Abundant, well distributed and hyperpolymorphic simple sequence repeats in prokaryote genomes and use of same for prokaryote classification and typing

ABSTRACT

A method is provided for classifying or typing a prokaryote to a class or a type. The method is effected by characterizing at least one polymorphic simple sequence repeat locus in a genome of the prokaryote and, based on a characterization of the polymorphic simple sequence repeat, classifying or typing the prokaryote to a class or a type. Compounds and articles of manufacture are provided for effecting the method.

FIELD AND BACKGROUND OF THE INVENTION

[0001] The present invention relates to classification and typing of prokaryotes and, more particularly, to abundant, well distributed and hyperpolymorphic simple sequence repeats in prokaryote genomes and the use of same for prokaryote classification and typing.

[0002] Simple sequence repeats (SSRs) are a class of short sequences, usually of 1-6 nucleotides, that are tandemly (i.e., head to tail) repeated from two or three up to a few dozen times at a locus (Vogt 1990). SSRs long have been known to be distributed throughout the genomes of eukaryotes and to be highly polymorphic (Tautz 1989, Weber 1990, Kashi et al. 1990). Polymorphisms arise primarily through slipped-strand mispairing during DNA replication (Strand et al. 1994, Tautz and Schlotterer 1994). There is accumulating evidence that SSRs serve a functional role, affecting gene expression (Kunzler et al. 1995, Kashi et al. 1997, King et al. 1997, Kashi and Soller 1998).

[0003] The sequencing of complete genomes of many prokaryotes presented the opportunity to screen such genomes for the existence of SSRs (Field and Wills 1996, 1998), revealing arrays not detected in earlier studies. Recent publication of the complete genome sequence for Escherichia coli (Blattner et al. 1997) provides the basis for characterization of its SSR arrays, both at a gross genomic level and at particular SSR loci.

[0004] Present-day approaches for typing of prokaryotes include growth in selective media, binding of specific antibodies, and amplification of DNA using the polymerase chain reaction. For example, conventional methods for detection of E. coli (Vanderzant and Splittstoesser 1992) include enrichment and isolation with selective or indicator media, such as E. coli (EC) broth, lauryl sulfate tryptose 4-methylumbelliferyl-β-D-glucuronic acid broth, eosin methylene blue agar, and MacConkey sorbitol agar. Procedures based on use of such media lead to identification of E. coli in a sample and estimation of number, but lack the ability to distinguish among E. coli strains. Hence, the entire process of strain identification remains difficult and time-consuming. Recent methods for identification of E. coli strains use antibodies or nucleic acid sequences that uniquely bind to a particular strain or group of strains. Several methods for immunological detection or biochemical identification of the toxin-producing E. coli strain O157:H7 have been described (Farmer and Davis 1985, March and Ratnam 1986, Kleanthous et al. 1988, Smith and Scotland 1988, Todd et al. 1988, Karmali 1989, Padhye and Doyle 1991, Tyler et al. 1991). However, these assays do not distinguish among the various members of other serogroups. DNA amplification-based assays have been reported (Karch and Meyers 1989, Pollard et al. 1990, Johnson et al. 1990, Johnson et al. 1991, Jackson 1991, Yu and Kaper 1992, Witham et al. 1996), but mostly have limitations including lengthy post-amplification detection protocols or lack of template quantification.

[0005] DNA sequence determination, on the other hand, is much simpler and more accurate.

[0006] There is thus a widely recognized need for, and it would be highly advantageous to have, a simple and rapid DNA sequence based technique for the classification and typing of prokaryotes.

[0007] While conceiving the present invention it was assumed that prokaryote SSRs might be polymorphic and that such polymorphism might be class and type correlated and, if it indeed exists, could be used to provide a simple tool for the presently labor-intensive and complicated task of classification and typing of prokaryotes.

[0008] While reducing the present invention to practice, length polymorphism was shown at two mononucleotide SSR loci in E. coli. The existence of thousands of SSR arrays in E. coli and in a wide range of other prokaryotes that should exhibit hypervariability is shown as well. Interestingly, these SSR sites exhibit an upper size limit of 12 bp, suggesting selective mechanisms that might impose this size limit.

SUMMARY OF THE INVENTION

[0009] According to one aspect of the present invention there is provided a method of classifying or typing a prokaryote to a class or a type, the method comprising the step of characterizing at least one polymorphic simple sequence repeat locus in a genome of the prokaryote and, based on a characterization of the polymorphic simple sequence repeat, classifying or typing the prokaryote to a class or a type.

[0010] According to another aspect of the present invention there is provided a pair of polymerase chain reaction primers having a sequence adapted for exponential amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote.

[0011] According to yet another aspect of the present invention there is provided a polymerase chain reaction product derived by amplifying a portion of the genome using the pair of polymerase chain reaction primers described above.

[0012] According to still another aspect of the present invention there is provided an allele specific oligonucleotide comprising a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under stringent allele specific oligonucleotide hybridization conditions of (i) a hybridization solution of 2×standard sodium citrate (SSC) and 0.1% sodium dodecyl sulfate (SDS); (ii) a hybridization temperature of from 42° C. to Tm −5° C. for 30 minutes to overnight, wherein Tm is estimated as 2×(the number of A plus T residues) + 4×(the number of G plus C residues); and (iii) post hybridization washes with 0.75×SSC and 0.1% SDS at a temperature from 42° C. to Tm −5° C. For further details see Bult, C. J., et al., which is incorporated by reference as if fully set forth herein.
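By way of illustration only, the Tm estimate recited in condition (ii) can be computed as in the following minimal sketch. The function names wallace_tm and hybridization_window, and the example oligonucleotide, are hypothetical and are not part of the disclosure; the sketch assumes an oligonucleotide composed solely of A, C, G and T.

```python
from typing import Optional, Tuple


def wallace_tm(oligo: str) -> int:
    """Estimate Tm (deg C) as 2 x (A + T residues) + 4 x (G + C residues)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


def hybridization_window(oligo: str, floor: int = 42) -> Optional[Tuple[int, int]]:
    """Return the (low, high) range from 42 deg C to Tm - 5 deg C.

    Returns None when Tm - 5 falls below the 42 deg C floor, i.e. when the
    oligo is too short or too AT-rich for the recited range to exist.
    """
    high = wallace_tm(oligo) - 5
    return (floor, high) if high >= floor else None
```

For a hypothetical 18-mer with nine A/T and nine G/C residues, the estimate is 2×9 + 4×9 = 54° C., giving a hybridization range of 42° C. to 49° C.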

[0013] According to still an additional aspect of the present invention there is provided a DNA chip comprising a surface and a plurality of allele specific oligonucleotides attached thereto, each of the plurality of allele specific oligonucleotides including a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under stringent hybridization conditions as described above. Preferably, the sequence of nucleotides is perfectly complementary to the specific simple sequence repeat.

[0014] According to an additional aspect of the present invention there is provided a hybrid of the allele specific oligonucleotide described above and the specific simple sequence repeat.

[0015] According to yet an additional aspect of the present invention there is provided a primer having a sequence adapted for amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote.

[0016] According to further features in preferred embodiments of the invention described below, characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by an allele specific oligonucleotide hybridization.

[0017] According to still further features in the described preferred embodiments characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by a polymerase chain reaction.

[0018] According to still further features in the described preferred embodiments characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by a sequencing reaction.

[0019] According to still further features in the described preferred embodiments characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by a heteroduplex hybridization reaction.

[0020] According to still further features in the described preferred embodiments characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by single strand conformational polymorphism.

[0021] According to still further features in the described preferred embodiments characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by restriction fragment length polymorphism.

[0022] According to still further features in the described preferred embodiments the at least one polymorphic simple sequence repeat locus is in a non-coding region of the genome.

[0023] According to still further features in the described preferred embodiments the prokaryote is of the genus Escherichia.

[0024] According to still further features in the described preferred embodiments the prokaryote is Escherichia coli.

[0025] According to still further features in the described preferred embodiments the prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.

[0026] According to still further features in the described preferred embodiments the prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.

[0027] According to still further features in the described preferred embodiments the prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.

[0028] According to still further features in the described preferred embodiments the prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp.

[0029] The present invention successfully addresses the shortcomings of the presently known configurations by providing a highly polymorphic genetic tool for efficient, rapid and accurate taxonomy of prokaryotes, which can be used for ultimate classification and typing.

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:

[0031] FIGS. 1a-d show the lengths of microsatellite arrays at given positions (bp) in selected, completely sequenced genomes of Escherichia coli, Bacillus subtilis, Archaeoglobus fulgidus and Saccharomyces cerevisiae chromosome No 7, respectively.

[0032] FIG. 2 demonstrates size differences in PCR products harboring specific SSR arrays among strains of E. coli, as shown following electrophoresis in a 5% acrylamide TBE denaturing sequencing gel. PCR was performed using primer pairs, one radiolabeled, flanking the poly-C tract at a genomic site upstream from the ATG site of the ycgW locus. The dried gel was exposed to a phosphoimager. The expected size for the K12 amplification product was about 200 bp. Shown in each lane are amplification products for the following strains and substrains: (1) K12:DH5α, (2) B:SR9b, (3) B:SR9c, (4) ETEC:O78:H [E10407], (5) EPEC:O111 [E639616], (6) E:1, (7) E:7, (8) E:18, (9) E:47.

[0033] FIGS. 3a-b show DNA sequence alignments for complementary DNA strands for two loci bearing mononucleotide repeat polymorphisms in strains of E. coli. PCR products were sequenced using the dideoxy-chain termination method, and aligned using the “Pile-up” program of the Genetics Computing Group version 9, default option for routine “pile-up”, wherein consensus sequences determined by the routine called “pretty” was set so that any difference among sequences was regarded as lack of consensus. SSR arrays are shown in bold letters, and the TATA box, where shown, is underlined. FIG. 3a: Poly-C tract of ycgW. FIG. 3b: Poly-G and Poly-T tracts of yaiN.

[0034] FIG. 4 is a schematic depiction of a DNA chip 10 according to the present invention, including an array of locations, each of which includes an allele specific oligonucleotide attached thereto.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0035] The present invention is of a method of classifying or typing a prokaryote to a class or a type which can be used in research and medical and food safety diagnostics. Specifically, the present invention can be used to type strains and substrains of prokaryotes into classes and types according to the established taxonomy by associating specific morphs of highly polymorphic simple sequence repeat loci with such classes and types. The invention is further of articles of manufacture, such as DNA chips, and other single nucleotide polymorphism (SNP)-based compounds, typically oligonucleotides or primers, useful in efficiently implementing the method according to the present invention for research and medical and food safety diagnostics.

[0036] The principles and operation of the present invention may be better understood with reference to the drawings and accompanying descriptions.

[0037] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

[0038] Although the existence of simple sequence repeat (SSR) DNA arrays in eukaryotic genomes long has been known, their existence in prokaryotic genomes only recently has been recognized. While reducing the present invention to practice, analysis of the published DNA sequence of E. coli strain K12 revealed tens of thousands of well distributed SSR arrays. PCR primer pairs were then developed for several SSR loci, and polymorphisms were found at two mononucleotide SSR loci. These results suggest that SSRs provide a ready source of genetic variability to be exploited by evolution and which is useful for a variety of applications as further detailed hereinunder. Polymorphisms differentiate non-pathogenic and pathogenic groups of E. coli strains, suggesting the utility of SSR typing for rapid, PCR- or oligonucleotide-based detection of pathogenic strains for diagnostics (medical and food safety) or for epidemiological studies. Analysis of published complete genome sequences of E. coli and 10 additional prokaryotes showed the existence of thousands of mono-, di-, tri-, and tetranucleotide microsatellite arrays, mostly in non-coding sequences. A general limit of 12 nucleotides in SSR array size was observed, suggesting the effect of a selective mechanism limiting array size. Because SSR polymorphisms in promoter regions could alter gene expression, it is inferred that such polymorphisms are an important factor in prokaryotic evolution and could be used by biotechnologists to purposefully “fine-tune” gene regulation in a microorganism of interest.

[0039] SSR content of E. coli: SSRs in E. coli, a gram-negative bacterium of interest as a model species and as a human pathogen, were characterized. The results show that SSR arrays in E. coli are numerous and diverse in terms of core sequence and repeat number. As further discussed below, SSRs in E. coli are also polymorphic. Findings of previous studies reflect the growing understanding of SSRs in E. coli. SSR regions in E. coli were thought to be very rare, limited to only a few dinucleotide SSRs with a maximum of five repeat units per locus (van Belkum et al. 1998). Field and Wills (1998) presented data showing the existence of hundreds of thousands of mononucleotide SSRs in E. coli, and mentioned SSRs with core lengths varying from 1 to 6 bp.

[0040] Polymorphism of SSRs in E. coli: The data collected while reducing the present invention to practice show repeat number variation at two out of four mononucleotide SSR loci examined in E. coli, and suggest that mononucleotide SSR loci may prove a source of thousands of polymorphisms for marking its genome. In contrast, polymorphisms were not observed at SSR loci with two or more nucleotide core sequences; however, this may be due to the small sample size employed.

[0041] Genomic locations of SSRs in E. coli: The findings reported herein indicate that SSR arrays in E. coli tend to be located 5′ or 3′ relative to coding sequences of genes. Similarly, in eukaryotes, arrays of certain types of repetitive DNA have been located to the 5′ or 3′ flanking regions of genes. These repeats may be involved in nucleosome organization, recombination, or regulation of gene expression or gene product activity (Tripathi and Bramachani 1977, Kashi et al. 1997, King et al. 1997, Kashi and Soller 1988). This also has been inferred for E. coli (Rosenberg et al. 1994). Sections of microbial genomic DNA bearing visibly different frequencies of tetranucleotide repeats than other parts of the genome contained ribosomal RNA, bacteriophage, or undefined coding regions (Noble et al. 1998).

[0042] SSRs in other prokaryotes: The presence of SSRs in genomes of several prokaryotes has been demonstrated in recent studies, and some SSR loci have been shown to exhibit length polymorphisms (reviewed by van Belkum et al. 1998), yet these SSRs are restricted to protein coding sequences. Variation was shown in trinucleotide repeats of very large array size in simple eukaryotes (Field and Wills 1996). The existence of mononucleotide microsatellites was subsequently shown in eight prokaryotic genomes; however, polymorphism was not pursued or anticipated (Field and Wills 1998). For genomes surveyed in both studies, the results shown herein agree very closely with those of Field and Wills (1998). The existence of SSR arrays with 1-6 bp core sequences in 10 additional prokaryotes, including four (Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, and Mycobacterium tuberculosis) not previously surveyed, is reported herein.

[0043] Thousands of SSRs with a range of core repeat sizes were observed in all genomes examined (see Table 1 in the Examples section that follows). The results presented herein differ in these respects from certain earlier findings. van Belkum et al. (1998) reported that only dinucleotide SSR candidates were detected in Methanococcus jannaschii, whereas SSRs with core sequences of one to six nucleotides were observed herein. van Belkum et al. (1997b) found only 18 potential SSRs in the genome of Hemophilus influenzae, and Scherer and van Belkum (unpublished data cited in van Belkum et al. 1998) found no perfectly repetitive DNA sequence motifs in the genome of Mycobacterium tuberculosis. Differences in the search algorithms for the software packages that the respective groups used may explain the different findings.

[0044] The algorithm used herein is available at http://www.technion.ac.il/pub/supported/biotech. In the first step, the user defines the minimum length and number of repeats of the core sequence, and hence, the minimum length of the SSR sought. The program then scans the genomic sequence at issue. When it reads a motif suiting the parameters specified, it checks whether this motif is contained within a repeated sequence by comparing its sequence with that of nucleotides following it. If the motif is indeed repeated, the program counts the number of repeats. The program will write the core sequence, the number of repeats, and the genomic location of the SSR in an output file. The program then will continue to move through the genomic sequence at issue, continuing in such fashion until the entire genome has been read and evaluated. This differs from earlier programs of similar intent because it both searches for all SSRs with repeat lengths up to a defined number of base pairs in length and records the genomic location of each repeat motif found.
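The scanning procedure described above may be sketched, under stated assumptions, as follows. This is not the program referenced above, whose source is not reproduced herein; the function name find_ssrs, its parameters (max_core, min_repeats, min_length), the 0-based positions, the rule of keeping the longest array found at a position, and the return of a list rather than an output file are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_ssrs(genome: str, max_core: int = 6, min_repeats: int = 2,
              min_length: int = 6) -> List[Tuple[int, str, int]]:
    """Scan a sequence for perfect tandem repeats.

    Returns (0-based start, core sequence, repeat number) for every array
    whose core is 1..max_core bp long, that has at least min_repeats copies
    and that spans at least min_length bp.
    """
    seq = genome.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (array length in bp, core, repeats)
        for k in range(1, max_core + 1):
            core = seq[i:i + k]
            if len(core) < k:
                break  # ran off the end of the sequence
            # Compare the motif with the nucleotides that follow it.
            repeats = 1
            while seq[i + repeats * k:i + (repeats + 1) * k] == core:
                repeats += 1
            span = repeats * k
            if repeats >= min_repeats and span >= min_length:
                # Keep the longest array; ties go to the shorter core.
                if best is None or span > best[0]:
                    best = (span, core, repeats)
        if best is not None:
            hits.append((i, best[1], best[2]))
            i += best[0]  # resume scanning after the recorded array
        else:
            i += 1
    return hits
```

For the hypothetical input GGATCCCCCCCTTACACACACGG, the sketch records a poly-C array of seven repeats starting at position 4 and an AC array of four repeats starting at position 13.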

[0045] Size limit of SSR arrays: Analysis of SSR tracts in genomic DNA sequences for E. coli and a wide variety of other prokaryotes shows that SSR array length is small relative to eukaryotes. The length of SSR tracts is determined by interacting processes of mutation and selection. Mutation at SSR loci is believed to be the consequence of slipped-strand mispairing during DNA replication (Strand et al. 1994, Tautz and Schlotterer 1994). The tertiary structure of repetitive DNA allows mismatching of neighboring repeats, and depending on the strand orientation, repeats can be inserted or deleted during DNA polymerase-mediated DNA duplication (Coggins and O'Prey 1989, Hauge and Litt 1993, Chiurazzi et al. 1994). High mutation rates at SSRs result when the DNA repair system is non-functional. The resulting mutations are not always repaired by DNA mismatch repair mechanisms (Strand et al. 1993). Additionally, polyAC/TG repeats are destabilized by mutations that induce SOS response in E. coli (Morel et al. 1998).

[0046] The data set presented herein exhibits certain repeat array lengths more frequently than others (see Table 1 in the Examples section that follows). Frequencies of particular SSR array lengths in E. coli do not decrease monotonically from 3 to 18. There are more arrays of 9 bp in length than of 7 or 8, and more of 12 bp in length than of 10 or 11. Array lengths with odd numbers of nucleotides greater than five are less common than arrays with comparable, even numbers of nucleotides. These observations run counter to what would be expected if particular SSR sequences arose by chance.

[0047] It is hypothesized that the frequencies of arrays exhibiting particular length are, in part, the consequence of the number of ways in which those particular repeat lengths can arise. An SSR repeat tract with a length of twelve bp can arise following five types of mutations: a mononucleotide may be repeated 12 times, a dinucleotide six times, a trinucleotide four times, a tetranucleotide three times, or a hexanucleotide twice. However, certain SSR structures of 12 bp in length are more common than others. In E. coli, no tracts were found where a single nucleotide was repeated 12 times, one where a dinucleotide was repeated six times, 54 where a trinucleotide was repeated four times, 47 where a tetranucleotide was repeated three times, and none where hexanucleotides were repeated twice. That SSR arrays of 12 bp in length occur 102 times, while arrays of 10 bp in length occur only 21 times, may be explained, in part, by there being only three ways by which mutation will give rise to an array of 10 bp; i.e., a mononucleotide may be repeated 10 times (one occurrence), a dinucleotide may be repeated five times (20 occurrences), or a pentanucleotide may be repeated twice (none). SSR arrays of nine nucleotides can arise two ways, either a mononucleotide repeated nine times (17 occurrences), or a trinucleotide repeated three times (2,013 occurrences). SSR arrays greater than 12 bp in length are not expected in the genome by chance alone, yet a few occur (two occurrences of a trinucleotide repeated five times, one of a tetranucleotide repeated four times, and two of a hexanucleotide repeated three times). Hence, not all mutations are equally likely, or a selective factor must be invoked to explain frequencies of SSRs of different structures. Although each prokaryote has its own distribution of SSR array sizes (Table 1, FIG. 1), in general, the upper limit for the size of SSR arrays is 12 bp. This suggests that the tendency for repeat length at a locus to rise via mutation is counteracted by selection, which affects the distribution of combinations of core length and repeat number, and which holds total SSR array length under 12 bp. This supports the hypothesis (Kashi et al. 1997, King et al. 1997) that some SSRs provide a source of variation that affects gene expression and, hence, is subject to selection.
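The counting argument of paragraph [0047] can be checked with the following minimal sketch, offered by way of illustration only. The function names repeat_structures and total_for_length are hypothetical; the restriction to core lengths of 1-6 bp and to at least two repeats is an assumption drawn from the ranges discussed herein; and the occurrence counts are those quoted in paragraph [0047] for E. coli.

```python
from typing import List, Tuple

# Occurrence counts quoted in paragraph [0047] for E. coli,
# keyed by (core length in bp, repeat number).
COUNTS_FROM_TEXT = {
    (1, 12): 0, (2, 6): 1, (3, 4): 54, (4, 3): 47, (6, 2): 0,  # 12 bp arrays
    (1, 10): 1, (2, 5): 20, (5, 2): 0,                         # 10 bp arrays
    (1, 9): 17, (3, 3): 2013,                                  # 9 bp arrays
}


def repeat_structures(array_length: int, max_core: int = 6,
                      min_repeats: int = 2) -> List[Tuple[int, int]]:
    """List every (core length, repeat number) pair giving array_length bp."""
    return [
        (core, array_length // core)
        for core in range(1, max_core + 1)
        if array_length % core == 0 and array_length // core >= min_repeats
    ]


def total_for_length(array_length: int) -> int:
    """Sum the quoted occurrence counts over all structures of that length."""
    return sum(COUNTS_FROM_TEXT.get(s, 0) for s in repeat_structures(array_length))
```

Under these assumptions the sketch yields five structures for 12 bp, three for 10 bp and two for 9 bp, and the quoted counts sum to 102 arrays of 12 bp and 21 arrays of 10 bp, in agreement with the totals stated above.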

[0048] Screening of SSRs among strains of E. coli showed seven sites exhibiting no polymorphism. The rate of mutations giving rise to SSR variation should be the same across a genome. Therefore, wherever there is a site that is totally conserved, one may infer that selection is operating at that locus. Hence, it is believed that monomorphic SSR sites are under strong selection, which will not tolerate mutations in these loci.

[0049] The distribution of SSR array sizes in yeast (Table 1, FIG. 1), a eukaryote, is different from those in prokaryotes. The distribution of SSR sizes includes longer core sequences and larger repeat numbers than in prokaryotic genomes. The process of generating polymorphism of SSRs, slipped strand replication, is expected to have roughly similar frequency and effects in prokaryotes and eukaryotes. The difference in the distribution of SSR array sizes may be attributable to a relaxed selective regime within eukaryotic genomes, which tolerate the presence of more “junk” DNA than prokaryotic genomes.

[0050] Practical utility of SSR polymorphisms: SSR polymorphism offers a useful tool for analysis of prokaryotic genomes. SSRs are widely used as a means for developing DNA fingerprints for eukaryotes. To characterize SSR polymorphisms in eukaryotes, large fragments have to be sized accurately, by DNA sequencing gels or by electrophoresis of PCR amplification products, necessarily involving the technical question of whether a gel supported accurate electrophoresis and thereby provided reliable results. In contrast, polymorphic mononucleotide sites that were found in E. coli exhibit 1 to 4 bp size differences. These small numbers of repeats are well suited for the development of SSR allele-specific oligonucleotides (ASOs). There are many existing and future methods for developing ASOs or SNPs that may be used to identify polymorphisms of SSRs. These ASOs or SNPs may be used as PCR primers, or as hybridization probes that can be spotted on the surface of a DNA chip for rapid, automated characterization of variation at a set of given loci for each E. coli genome. Upon preparation of PCR probes or a DNA chip, polymorphisms of prokaryotic SSRs may be screened to support a variety of research and diagnostic applications.

[0051] It is especially interesting that E. coli is part of the normal human microflora, but there are pathogenic strains that have to be distinguished and rapidly detected. The use of SNPs, single nucleotide polymorphisms, is the preferred emerging technique that can be used to automatically screen polymorphism. Affymetrix, Inc. has developed a DNA chip for implementing that technique. To this end, see U.S. Pat. Nos. 5,843,655 for “Methods for testing oligonucleotide arrays”; 5,837,832 for “Arrays of nucleic acid probes on biological chips”; 5,834,758 for “Method and apparatus for imaging a sample on a device”; 5,831,070 for “Printing oligonucleotide arrays using deprotection agents solely in the vapor phase”; 5,770,722 for “Surface-bound, unimolecular, double-stranded DNA”; 5,770,456 for “Cyclic nucleic acid and polypeptide arrays”; 5,753,788 for “Photolabile nucleoside protecting groups”; 5,744,305 for “Arrays of materials attached to a substrate”; 5,733,729 for “Computer-aided probability base calling for arrays of nucleic acid probes on chips”; 5,710,000 for “Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping”; 5,631,734 for “Method and apparatus for detection of fluorescently labeled materials”; 5,599,695 for “Printing molecular library arrays using deprotection agents solely in the vapor phase”; and 5,593,839 for “Computer-aided engineering system for design of sequence arrays and lithographic masks”, which are incorporated by reference as if fully set forth herein.

[0052] The polymorphic and small SSRs (12 nucleotides or less) reported herein lend themselves to detection by combining DNA chip technology with ASO technology. However, as other rapid methods may be developed, the scope of the present invention is not limited to DNA chip technology.

[0053] Screenings of SSR variation may provide the basis for rapid and sensitive identification of E. coli strains, and sometimes may prove advantageous over existing serotyping or molecular genetic methods, distinguishing strains or subdividing strains to subgroups. For example, SSR variation may prove useful for rapidly and sensitively distinguishing pathogenic strains of E. coli, such as O157:H7, from more common strains. Similarly, knowledge of SSR variation in other pathogenic microbes, such as Haemophilus influenzae (van Belkum et al. 1997a), Candida albicans (Bretagne et al. 1997), Bacteroides fragilis and B. thetaiotaomicron (Claros et al. 1997), and Helicobacter pylori (Marshall et al. 1996), might be applied for rapid detection and strain characterization.

[0054] SSRs can be screened to determine to what extent molecular variation gives rise to phenotypic variation. For example, SSR variability poses clear implications with regard to virulence. Motifs of SSRs have been found within suspected or confirmed virulence genes of Hemophilus influenzae (Karlin et al. 1996); Neisseria sp., Hemophilus parainfluenzae, and Moraxella catarrhalis (Peak et al. 1996), and repeat number variation seems to be related to modulation of the expression of virulence factors. Contingency genes containing SSRs exhibit high mutation rates, allowing the bacterium to respond rapidly upon encountering challenging environmental conditions (Moxon et al. 1994). Location of SSR repeat arrays by computerized search of the genomic sequence and localization of such arrays with regard to expressed genes, as reported herein, could provide a basis for searching for new virulence-related loci in E. coli.

[0055] SSR polymorphism can be used for epidemiological purposes, for example, to determine whether a pathogenic E. coli strain detected in a patient matches a potential source of a given disease outbreak. SSRs have been used as markers for such purposes for several pathogenic microbes. For example, repetitive DNA elements of Mycobacterium tuberculosis have been used for efficient strain tracking (Van Soolingen et al. 1993). Epidemiologically informative microsatellite DNA polymorphisms have been observed in different strains of Helicobacter pylori (Marshall et al. 1996). SSR variation has been used to identify the strains of Haemophilus influenzae isolated from different patients (van Belkum et al. 1997a). To demonstrate a similar approach in E. coli, a series of allelic SSR markers distinguishing relevant strains can be developed. The observation reported herein of polymorphism within the B strain between SR9b and SR9c suggests that SSR markers might even prove capable of resolving variation within strains.

[0056] Many SSR arrays are found in the promoter regions of genes, affecting the expression of genes in a way tolerated by the host bacterium (Kashi et al. 1997, King et al. 1997). The results shown herein represent a 100 kb sample of the E. coli genome, and should be representative of the entire genome thereof. Constancy of dinucleotide relative abundance profiles has been shown over multiple 50 kb disjoint contigs within the same genome in E. coli and 14 other prokaryotes (Karlin et al. 1997). This source of phenotypic variation can be exploited not only by natural selection, but also by biotechnologists to manipulate bacterial genomes to express a desired phenotype.

[0057] The detection and characterization of specific nucleic acid sequences and sequence changes have been utilized to detect the presence of viral or bacterial nucleic acid sequences indicative of an infection, the presence of variants or alleles of mammalian genes associated with disease and cancers, and the identification of the source of nucleic acids found in forensic samples, as well as in paternity determinations.

[0058] Various methods are known in the art which may be used to detect and characterize specific nucleic acid sequences and sequence changes. Nonetheless, as nucleic acid sequence data of the human genome, as well as the genomes of pathogenic organisms, accumulates, the demand for fast, reliable, cost-effective and user-friendly tests for specific sequences continues to grow. Importantly, these tests must be able to create a detectable signal from a very low copy number of the sequence of interest. The following discussion examines three levels of nucleic acid detection currently in use: (i) signal amplification technology for detection of rare sequences; (ii) direct detection technology for detection of higher copy number sequences; and (iii) detection of unknown sequence changes for rapid screening of sequence changes anywhere within a defined DNA fragment.

[0059] Signal Amplification Technology Methods for Amplification

[0060] The “Polymerase Chain Reaction” (PCR) comprises the first generation of methods for nucleic acid amplification. However, several other methods have been developed that employ the same basis of specificity, but create signal by different amplification mechanisms. These methods include the “Ligase Chain Reaction” (LCR), “Self-Sustained Sequence Replication” (3SR/NASBA), and “Qβ-Replicase” (Qβ).

[0061] The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method for increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves introducing a molar excess of two oligonucleotide primers, which are complementary to their respective strands of the double-stranded target sequence, to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization, and polymerase extension can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence.

[0062] The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter. Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be “PCR-amplified.”

[0063] The ligase chain reaction (LCR; sometimes referred to as “Ligase Amplification Reaction” (LAR)), described by Barany, Proc. Natl. Acad. Sci., 88:189 (1991); Barany, PCR Methods and Applic., 1:5 (1991); and Wu and Wallace, Genomics 4:560 (1989), has developed into a well-recognized alternative method for amplifying nucleic acids. In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed, and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation, hybridization and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes (Segev, PCT Public. No. WO90,01069 A1 (1990)). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.

[0064] The self-sustained sequence replication reaction (3SR) (Guatelli et al., Proc. Natl. Acad. Sci., 87:1874-1878, 1990, with an erratum at Proc. Natl. Acad. Sci., 87:7797, 1990) is a transcription-based in vitro amplification system (Kwok et al., Proc. Natl. Acad. Sci., 86:1173-1177, 1989) that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection (Fahy et al., PCR Meth. Appl., 1:25-33, 1991). In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5′ end of the sequence of interest. In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).

[0065] In Q-Beta (Qβ), a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37° C.). This prevents the use of high temperature as a means of achieving specificity. As in the LCR, the ligation event can be used to detect a mutation at the junction site, but not elsewhere.

[0066] A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., >55° C.). Therefore the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.

[0067] The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as: (1+X)^n = y, where “X” is the mean efficiency (percent copied in each cycle), “n” is the number of cycles, and “y” is the overall efficiency, or yield of the reaction (Mullis, PCR Methods Applic., 1:1, 1991).
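
By way of illustration only, the yield relation (1+X)^n = y can be evaluated numerically. The following is a minimal sketch; the function names and the sample efficiency values are assumptions made for this illustration and do not form part of the cited reference.

```python
import math


def pcr_yield(efficiency, cycles):
    """Fold amplification y = (1 + X)^n.

    efficiency: mean per-cycle efficiency X as a fraction (1.0 = perfect doubling).
    cycles: number of cycles n.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return (1.0 + efficiency) ** cycles


def cycles_needed(efficiency, fold):
    """Smallest whole number of cycles n with (1 + X)^n >= fold."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    # Hypothetical efficiencies, chosen only to show how strongly yield depends on X.
    for x in (1.0, 0.85, 0.70):
        print(f"X={x:.2f}: yield after 30 cycles = {pcr_yield(x, 30):.3e}")
    print("cycles for a million-fold gain at X=1.0:", cycles_needed(1.0, 1e6))
```

Because the efficiency enters as the base of an exponent, a modest drop in X compounds over many cycles, which is one reason exponential amplification is harder to quantify than direct detection.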

[0068] Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method for the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3′ end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence.

[0069] A similar 3′-mismatch strategy is used with greater effect to prevent ligation in the LCR (Barany, PCR Meth. Applic., 1:5, 1991). Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory.

Direct Detection Technology

[0070] When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Traditional methods of direct detection, including Northern and Southern blotting and RNase protection assays, usually require the use of radioactivity and are not amenable to automation. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the “Cycling Probe Reaction” (CPR), and “Branched DNA” (bDNA).

[0071] The cycling probe reaction (CPR) (Duck et al., BioTech., 9:142, 1990) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.

[0072] Branched DNA (bDNA), described by Urdea et al., Gene 61:253-264 (1987), involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.

[0073] Detection of Unknown Sequence Changes

[0074] The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing.

[0075] A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting.

[0076] In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.

[0077] For detection of single-base or length differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide or nucleotides in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, or between restriction recognition sequences, a change in the pattern of digestion or gel migration can be used as a diagnostic tool (e.g., restriction fragment length polymorphism (RFLP) analysis).

[0078] Single point mutations have also been detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the DNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the “Mismatch Chemical Cleavage” (MCC) (Gogos et al., Nucl. Acids Res., 18:6807-6817, 1990). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory.

[0079] If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the unknown nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASOs) also has been applied to the detection of specific point mutations (Conner et al., Proc. Natl. Acad. Sci., 80:278-282, 1983). The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes (Vogelstein et al., N. Eng. J. Med., 319:525-532, 1988; and Farr et al., Proc. Natl. Acad. Sci., 85:1629-1633, 1988), and gsp/gip oncogenes (Lyons et al., Science 249:655-659, 1990). Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible mutations.

[0080] With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation of an unknown character and position within a gene or sequence of interest.

[0081] Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed “Denaturing Gradient Gel Electrophoresis” (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are “clamped” at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC “clamp” to the DNA fragments increases the fraction of mutations that can be recognized by DGGE (Abrams et al., Genomics 7:463-475, 1990). Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature (Sheffield et al., Proc. Natl. Acad. Sci., 86:232-236, 1989; and Lerman and Silverstein, Meth. Enzymol., 155:482-501, 1987). Modifications of the technique have been developed, using temperature gradients (Wartell et al., Nucl. Acids Res., 18:2699-2701, 1990), and the method can also be applied to RNA:RNA duplexes (Smith et al., Genomics 3:217-223, 1988).

[0082] The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE) (Borrensen et al., Proc. Natl. Acad. Sci. USA 88:8405, 1991). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of unknown mutations.

[0083] A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient (Scholz et al., Hum. Mol. Genet. 2:2155, 1993). TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA. Therefore scanning of large gene segments requires the use of multiple PCR products prior to running the gel.

[0084] Another common method, called “Single-Strand Conformation Polymorphism” (SSCP), was developed by Hayashi, Sekiya and colleagues (reviewed by Hayashi, PCR Meth. Appl., 1:34-38, 1991) and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations (Orita et al., Genomics 5:874-879, 1989).

[0085] The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.

[0086] Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of unknown mutations (Liu and Sommer, PCR Methods Appli., 4:97, 1994). The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis.

[0087] According to one aspect of the present invention there is provided a method of classifying or typing a prokaryote to a class or a type. The method according to the present invention is effected by characterizing at least one polymorphic simple sequence repeat locus in a genome of the prokaryote and, based on a characterization of the polymorphic simple sequence repeat, classifying or typing the prokaryote to a class or a type.

[0088] It is shown hereinunder in the Examples section that simple sequence repeat loci are highly abundant among various prokaryotes of different genera. It is further shown hereinunder in the Examples section that simple sequence repeat loci can be highly polymorphic within strains and substrains of the same prokaryote species. The method according to the present invention takes advantage of these findings to provide an effective, readily implementable, rapid and accurate tool for classifying or typing a prokaryote to a class or a type.

[0089] As described above, the art of molecular biology provides a plurality of experimental protocols and devices dedicated to characterizing sequences. Many such protocols and devices were developed to assist in rapid characterization of polymorphic loci in eukaryotes, human beings in particular, typically to assist in pre- and postnatal detection and/or diagnosis of genetic disorders.

[0090] These methods and devices can be efficiently implemented in combination with the method according to the present invention to assist in characterizing polymorphic simple sequence repeat loci in the genomes of prokaryotes.

[0091] Characterizing polymorphic simple sequence repeat loci in the genomes of prokaryotes can, for example, be effected by any one of the methods described hereinabove. Yet other methods can also be applicable for such characterization.

[0092] Characterizing polymorphic simple sequence repeat loci in the genomes of prokaryotes according to the present invention can be effected, for example, by an allele specific oligonucleotide hybridization.

[0093] As further detailed hereinabove, allele specific oligonucleotide hybridization is based on the ability of nucleic acid sequences to form specific hybrids with nucleic acids complementary thereto. When short sequences of up to about 50 nucleotides are allowed to hybridize, one can, by appropriately selecting the hybridization conditions, control the effectiveness of the hybridization to a degree which is sensitive to minute sequence alterations present between the hybridizing sequences, even to a degree of a single point mutation or mismatch. In other words, for sequences shorter than 50, preferably shorter than 40, more preferably shorter than 30, most preferably in the range of 8 to 20 nucleotides in length, hybridization conditions are providable, so as to allow only complete matching strands to hybridize, whereas as little as a single point mutation prevents such hybridization.

[0094] The hybridization conditions selected for allele specific oligonucleotide hybridization are very much dependent on the sequence of the hybridizing strands. These conditions are typically accomplished for specific sequences by changing the temperature under which hybridization takes place. Since a single mismatch between hybridizing strands lowers the melting temperature (Tm, the temperature at which 50% of the strands are found as hybrids) thereof by about 2.5° C., controlling the hybridization temperature provides an effective means for controlling allele specific oligonucleotide hybridization. When a plurality of hybridizations are carried out simultaneously, so that it is not practical to control the temperature, control of the length of the hybridizing sequences is preferably exercised, to thereby bring the Tm values of all hybridizing pairs to a close range.

[0095] Thus, according to the present invention there is provided an allele specific oligonucleotide comprising a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under stringent allele specific oligonucleotide hybridization conditions of (i) a hybridization solution of 2× standard sodium citrate (SSC) and 0.1% sodium dodecyl sulfate (SDS); (ii) a hybridization temperature of from 42° C. to Tm −5° C. for 30 minutes to overnight, wherein Tm is estimated as 2×(the number of A plus T residues) + 4×(the number of G plus C residues); and (iii) post-hybridization washes with 0.75× SSC and 0.1% SDS at a temperature from 42° C. to Tm −5° C.
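
The Tm estimate and the temperature range recited in condition (ii) above can be computed as in the following minimal sketch. The function names are assumptions introduced for this illustration; the 42° C. lower bound and the Tm −5° C. upper bound are taken from the conditions recited above, and the example oligonucleotide is the aidB forward primer (SEQ ID NO:7) listed in the Examples section.

```python
def estimated_tm(oligo):
    """Tm estimate in degrees C: 2 x (A + T residues) + 4 x (G + C residues)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


def hybridization_window(oligo, lower_c=42):
    """Return (lower, upper) temperatures in degrees C, from 42 to Tm - 5.

    Returns None when Tm - 5 falls below the lower bound, i.e. when the
    oligonucleotide is too short or too AT-rich for the recited range.
    """
    upper = estimated_tm(oligo) - 5
    return (lower_c, upper) if upper >= lower_c else None


if __name__ == "__main__":
    aidb_forward = "GTCAGAGCAGATCCAGAATG"  # SEQ ID NO:7
    print(estimated_tm(aidb_forward), hybridization_window(aidb_forward))
```

For a 20-mer with ten A/T and ten G/C residues the estimate is 60° C., so the recited range runs from 42° C. to 55° C. Very short or AT-rich oligonucleotides may yield no usable range, which is consistent with the preference for adjusting oligonucleotide length when many hybridizations share one temperature.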

[0096] Further according to the present invention there is provided a DNA chip comprising a surface and a plurality of allele specific oligonucleotides attached thereto. Each of the plurality of allele specific oligonucleotides includes a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under the stringent hybridization conditions described above.

[0097] In a preferred embodiment of the present invention, the sequence of nucleotides selected for each of the allele specific oligonucleotides according to the present invention is perfectly complementary to its respective specific simple sequence repeat.

[0098] As further detailed hereinabove, another widely accepted method in molecular biology is the polymerase chain reaction (PCR) in which primers having the 3′ end thereof facing each other yet which are complementary to different strands of a double stranded nucleic acid molecule are employed to provide for exponential amplification of a portion of the double stranded nucleic acid molecule defined by the boundaries of the primers by a cycled physical/enzymatic reaction of three substeps: primers annealing (physical), primers elongation (enzymatic) and primers denaturation (physical), effected by cycling the reaction temperature.

[0099] Since its introduction, numerous protocols have been developed in which the polymerase chain reaction has been successfully used for detection of sequence polymorphisms, including repeats, e.g., triplet repeats, in the genome of human beings.

[0100] Thus, according to another aspect of the present invention there is provided a pair of polymerase chain reaction primers having a sequence adapted for exponential amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote. According to yet another aspect of the present invention there is provided a primer having a sequence adapted for amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote. Such primers have a length of between 10 and 40 nucleotides and typically flank the polymorphic site. Examples of such primers are provided in the Examples section hereinunder. Such primers can be used to provide polymerase chain reaction products of a polymorphic length depending on the specific genome, a portion of which they are designed to amplify. The morph of the product thus obtained can be identified by one of a plurality of protocols, including size separation such as gel electrophoresis and sequence determination via DNA sequencing using, for example, the dideoxy sequencing protocol, via allele specific hybridization as described above, or via other methods, some of which are further detailed herein.
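
The dependence of product length on the number of repeats between flanking primers can be sketched as follows. This is an illustrative in silico model only, assuming exact primer matches on a single strand; the function names, the toy primer sequences and the toy templates are hypothetical and are not the primers or loci of the Examples section.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_product(template, forward, reverse):
    """Return the predicted product (primer to primer, inclusive) or None.

    template: top strand, 5' to 3'.
    forward: primer identical to the top strand.
    reverse: primer complementary to the top strand, written 5' to 3'.
    Only exact, first-occurrence matches are considered in this sketch.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    fwd, rev = "ACGTTGCA", "TTGACCAG"  # hypothetical toy primers
    for repeats in (8, 10):
        # Hypothetical locus: a mononucleotide (G)n tract between fixed flanks.
        locus = "CC" + fwd + "AT" + "G" * repeats + "TA" + reverse_complement(rev) + "AA"
        product = in_silico_product(locus, fwd, rev)
        print(f"(G){repeats}: product length {len(product)} bp")
```

In this model two alleles that differ by two repeat units of a mononucleotide tract yield products differing by 2 bp, which is the kind of length difference that size separation or sequencing would then resolve.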

[0101] As further detailed hereinabove, the formation of heteroduplexes can be employed for detection of polymorphism in nucleic acids, e.g., as effected by TGGE or DGGE. Thus, according to an embodiment of the present invention characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by a heteroduplex hybridization reaction.

[0102] As further detailed hereinabove, single strand conformational polymorphism (SSCP) can be employed for detection of polymorphism in nucleic acids. Thus, according to an embodiment of the present invention characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by single strand conformational polymorphism.

[0103] As further detailed hereinabove, restriction fragment length polymorphism (RFLP) can be employed for detection of polymorphism in nucleic acids. Thus, according to an embodiment of the present invention characterizing the at least one polymorphic simple sequence repeat locus in the genome of the prokaryote is effected by restriction fragment length polymorphism.

[0104] According to a presently preferred embodiment of the present invention the polymorphic simple sequence repeat loci employed for classification or typing using the method of the present invention are located in a non-coding region of the genome of the prokaryotes analyzed. Such a location is presently preferred because, in evolutionary terms, variability is tolerated to a much greater extent in such regions. It is well known that while coding region mutations are in most cases selected against, no such effective selection pressure is imposed on mutations present in non-coding regions.

[0105] As used herein in the specification and in the claims section below, the term “non-coding” refers to regions in the prokaryotic genome that do not include direct information for the synthesis of proteins, i.e., are not a part of a translated sequence.

[0106] The polymorphic simple sequence repeat loci employed for classification or typing using the method of the present invention are typically 12 bp long or less and are composed of mono-, di-, tri-, tetra-, penta- or hexanucleotide repeats. Thus, the number of repeats can range, in most cases, from 2 to 12. Longer repeats are not excluded, yet are shown herein to be much less abundant in prokaryotes.

[0107] The invention described herein is not to be taken as limited to any prokaryote genus or species in particular. Indeed, as further exemplified hereinunder, every prokaryote tested so far was shown to abundantly include simple sequence repeats.

[0108] Thus, the prokaryote can be of the genus Escherichia, such as Escherichia coli. It can also be from the genera Aquifex, Treponema, Bacillus, Listeria or Mycobacterium, such as Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis, respectively. Yet it can also be from any other genera, such as, but not limited to, Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus or Synechocystis, for example, Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus or Synechocystis sp. PCC6803.

[0109] It will be appreciated that the scope of the present invention is not limited to the polymorphism detection methods described herein, and that other polymorphism detection methods can be effectively employed to implement the invention.

[0110] Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

EXAMPLES

[0111] Reference is now made to the following examples, which together with the above descriptions, illustrate the invention in a non-limiting fashion.

[0112] Generally, the nomenclature used herein and the laboratory procedures in recombinant DNA technology described below are those well known and commonly employed in the art. Standard techniques are used for cloning, DNA and RNA isolation, amplification and purification. Generally, enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like are performed according to the manufacturers' specifications. These techniques and various other techniques are generally performed according to Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989). The manual is hereinafter referred to as “Sambrook”. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.

Experimental Methods

[0113] Genomic Sequence Analysis

[0114] DNA sequence analysis software, written in the programming language C, that screens entire genomes for SSRs of 1-6 bp core length and reports core sequence, number of repeats, and genomic position was used for genomic sequence analysis. This DNA sequence analysis software is available for downloading from http://www.technion.ac.il/pub/supported/biotech.
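
For illustration, a screen of the kind described above can be sketched as follows. This is not the C software referred to above; it is a minimal Python sketch, and the function names, the default threshold of three repeats, the zero-based position convention and the rule of skipping cores that are themselves repeats of a shorter unit are all assumptions made for this illustration.

```python
def is_primitive(core):
    """True if the core is not itself a tandem repeat of a shorter unit."""
    n = len(core)
    return not any(n % k == 0 and core == core[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence, max_core=6, min_repeats=3):
    """Scan a sequence for tandem repeats of cores 1..max_core bp long.

    Returns a sorted list of (position, core, repeats) tuples, where position
    is the zero-based start of the array. Cores such as "AA" or "ACAC" are
    skipped so that each array is reported once, under its shortest core.
    """
    seq = sequence.upper()
    hits = []
    for k in range(1, max_core + 1):
        i = 0
        while i + k <= len(seq):
            core = seq[i:i + k]
            if not is_primitive(core) or any(base not in "ACGT" for base in core):
                i += 1
                continue
            repeats = 1
            while seq[i + repeats * k:i + (repeats + 1) * k] == core:
                repeats += 1
            if repeats >= min_repeats:
                hits.append((i, core, repeats))
                i += repeats * k  # continue after the reported array
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence containing one dinucleotide array.
    for position, core, repeats in find_ssrs("TTGACACACACAGG"):
        print(f"position {position}: ({core}){repeats}")
```

Applied to a complete genome, the same loop would be run over the full sequence string. A practical screen would likely apply a different minimum repeat number for each core length, since short mononucleotide runs occur frequently by chance.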

[0115] The complete genomic sequence of Escherichia coli (Blattner et al. 1997) was obtained by ftp from GenBank, and screened for the occurrence of SSRs, their core sequence, number of repeats, and genomic locations. The identity of genes in the region and the locations of the SSRs with regard to coding and non-coding elements of such genes were assessed by use of the BLAST DNA sequence analysis program (available at http://www.ncbi.nlm.nih.gov/blast).

[0116] Complete genome sequences of Aquifex aeolicus (Deckert et al. 1998), Treponema pallidum (Fraser et al. 1998), Haemophilus influenzae (Fleischmann et al. 1995), Mycoplasma pneumoniae (Himmelreich et al. 1996), Bacillus subtilis (Kunst et al. 1997), Helicobacter pylori 26695 (Tomb et al. 1997), Methanococcus jannaschii (Bult et al. 1996), Archaeoglobus fulgidus (Klenk et al. 1997), Synechocystis sp. PCC6803 (Kaneko and Tabata 1997), and Saccharomyces cerevisiae chromosome VII (Tettelin et al. 1997) were obtained by ftp from GenBank.

[0117] Characterization of E. coli Microsatellites

[0118] PCR Amplification Primers

[0119] Nine SSR loci of E. coli were selected for detailed analysis. The forward (F) and reverse (R) PCR primer sequences for the loci examined are as follows:
ycgW: F, 5′-GATTTTGCATATGAGTATATTAC-3′ (SEQ ID NO:1); R, 5′-TTAATTACAGGATGTTCAGTC-3′ (SEQ ID NO:2).
yaiN: F, 5′-AATTTATCCGGTGAATGTGGT-3′ (SEQ ID NO:3); R, 5′-CAACTTAATCTCGGGCTGAC-3′ (SEQ ID NO:4).
yjiD: F, 5′-TACATGGCTGATTATGCGG-3′ (SEQ ID NO:5); R, 5′-TCGCTATGAATATCTACTGAC-3′ (SEQ ID NO:6).
aidB: F, 5′-GTCAGAGCAGATCCAGAATG-3′ (SEQ ID NO:7); R, 5′-TCTACAGCAAATGAACAATG-3′ (SEQ ID NO:8).
molR_1: F, 5′-GGTCATCAGGTGAAATAATC-3′ (SEQ ID NO:9); R, 5′-CGTCCTGATAGATAAAGTGTC-3′ (SEQ ID NO:10).
ftsZ: F, 5′-CAATGGAACTTACCAATGAC-3′ (SEQ ID NO:11); R, 5′-TACCGCGAAGAATTCAACAC-3′ (SEQ ID NO:12).
G1787979: F, 5′-AGCATCAGCGCACAATGCAC-3′ (SEQ ID NO:13); R, 5′-TGTATGCAGGCTGGCACAAC-3′ (SEQ ID NO:14).
yiaB: F, 5′-ATAACGATCTCCATATCTAC-3′ (SEQ ID NO:15); R, 5′-CTCTATCAGCAACTTCTGCC-3′ (SEQ ID NO:16).
hisC: F, 5′-ATCCGCAGGATTTTCGCACC-3′ (SEQ ID NO:17); R, 5′-TGCCAGCGTAAATCCGCAAC-3′ (SEQ ID NO:18).
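To illustrate how a primer pair flanking an SSR turns a difference in repeat number into a difference in PCR product size, the following Python sketch locates a forward primer and the reverse complement of a reverse primer on a template and reports the product length. The ycgW primer sequences are SEQ ID NOS:1 and 2; the template built by `toy_template`, including its spacer and end sequences, is hypothetical and is not the ycgW locus. The function names are likewise assumptions.

```python
YCGW_F = "GATTTTGCATATGAGTATATTAC"  # SEQ ID NO:1
YCGW_R = "TTAATTACAGGATGTTCAGTC"    # SEQ ID NO:2


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product bounded by the two primers, or None if either is absent."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end < 0:
        return None
    return end + len(reverse) - start


def toy_template(c_repeats: int) -> str:
    """Hypothetical template: primer sites flanking a poly-C tract of given length."""
    return ("AAGG" + YCGW_F + "C" * c_repeats + "TGACGTAGCT"
            + reverse_complement(YCGW_R) + "GGTT")


if __name__ == "__main__":
    for n in (4, 8, 10):
        print(n, amplicon_length(toy_template(n), YCGW_F, YCGW_R))
```

On this toy template the product length changes by exactly one base pair per mononucleotide repeat, which is the scale of difference that the electrophoresis step has to resolve.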

[0120] E. coli Strains

[0121] Non-pathogenic and pathogenic strains (and substrains) of E. coli screened for variation at SSR loci included K12 (DH5α, W4100, W3110), B (SR9b, SR9c), E (1, 7, 11, 18, 47, 52, 54, 63, 68, 69; see Ochman and Selander 1987), EHEC O157:H7 (FEB, Rowe no. E304810, HER 1057, 1058, 1261, 1265, 1266), EPEC (serotype O111 [Rowe no. E639616]), and ETEC (serotype O78:H [Rowe no. E10407]). The K and B strains were obtained from the microbiology laboratory collection of the Department of Food Engineering and Biotechnology, Technion-Israel Institute of Technology, Haifa 32000, Israel. The E strains were isolated by and obtained from Ochman and Selander (1984). The EHEC O157:H7 HER strains were isolated by and obtained from Ahmed (1987).

[0122] Preparation of E. coli Template DNA for PCR Amplification

[0123] Cultures for DNA extraction were grown on LB agar plates for 24 hours at 37° C. A large loop of colonies from the plate was transferred to a microcentrifuge tube containing 500 μl of TE buffer (pH 7.5), and vortexed thoroughly. Bacterial cells were lysed at 80° C. for 10 minutes, and centrifuged for 10 minutes at 14,000 rpm (20,800×g). The pellet was suspended in 100 μl TE, boiled for 5 minutes, and centrifuged at 14,000 rpm for 2 minutes. The supernatant was held at −20° C. until used for PCR.

[0124] PCR Conditions

[0125] Five μl of DNA extract (about 50 ng), 2.5 μl 10×PCR buffer (Promega, 25 mM Mg⁺⁺ added), 0.2 μl of 25 mM dNTPs, 1.0 units Taq polymerase (Promega), and 10 picomoles each of forward and reverse primers were brought to a final volume of 25 μl with sterile double distilled H₂O. Mineral oil (15-20 μl) was added for PCR in an MJ Research thermocycler without a heating cover. The cycling conditions for PCR consisted of: denaturation at 95° C. for 5 minutes, followed by 5 cycles (1 minute at 95° C., 1 minute at T_(m) and 1 minute at 72° C.), 20 cycles (1 minute at 95° C., 1 minute at T_(m) −5° C., and 1 minute at 72° C.), a final step of 7 minutes at 72° C., and cooling to room temperature. T_(m) stands for melting temperature and was calculated using the Generunner 3 or Oligo 4.1 software. The following T_(m)s were employed: for the ycgW locus, 57° C.; for the yaiN locus, 62° C.; for the yjiD locus, 57° C.; for the aidB locus, 56° C.; for the molR_1 locus, 58° C.; for the ftsZ locus, 57° C.; for the G1787979 locus, 62° C.; for the yiaB locus, 58° C.; and for the hisC locus, 62° C.
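The cycling conditions of paragraph [0125] can be written out as data, which makes the per-locus annealing temperatures and the programmed hold time easy to check. The following Python sketch is illustrative only: the names `Segment`, `cycling_program` and `hold_minutes` are assumptions, locus names follow the primer list of paragraph [0119], and ramp times and the final cooling step are ignored.

```python
from dataclasses import dataclass

# T_(m) values in degrees C, per locus, as listed in paragraph [0125].
LOCUS_TM_C = {
    "ycgW": 57, "yaiN": 62, "yjiD": 57, "aidB": 56, "molR_1": 58,
    "ftsZ": 57, "G1787979": 62, "yiaB": 58, "hisC": 62,
}


@dataclass(frozen=True)
class Segment:
    cycles: int
    steps: tuple  # tuple of (temperature in degrees C, minutes)


def cycling_program(tm_c):
    """Cycling segments for one locus, given its T_(m) in degrees C."""
    return [
        Segment(1, ((95, 5),)),                          # initial denaturation
        Segment(5, ((95, 1), (tm_c, 1), (72, 1))),       # 5 cycles annealing at T_(m)
        Segment(20, ((95, 1), (tm_c - 5, 1), (72, 1))),  # 20 cycles at T_(m) - 5
        Segment(1, ((72, 7),)),                          # final extension
    ]


def hold_minutes(program):
    """Total programmed hold time in minutes, excluding ramps."""
    return sum(seg.cycles * sum(minutes for _, minutes in seg.steps) for seg in program)


if __name__ == "__main__":
    for locus, tm in LOCUS_TM_C.items():
        program = cycling_program(tm)
        print(locus, "anneal:", tm, "then", tm - 5, "| hold minutes:", hold_minutes(program))
```

Every locus shares the same 87 minutes of programmed holds (5 + 5×3 + 20×3 + 7); only the two annealing temperatures differ between loci.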

[0126] End Labeling of Primers

[0127] Radioactive end labeling of primers was as follows: 2 μl (1 ng) primer DNA, 2 μl 10×T₄ kinase buffer (NEB), 4 μl ³⁵S-γ-ATP (250 mCurie, NEN), and 1 μl (10 units) T₄ DNA kinase (NEB) were brought to a final volume of 20 μl with sterile double distilled H₂O. The contents were mixed and held at 37° C. for 1 hour. The reaction was stopped by incubation at 70° C. for 10 minutes.

[0128] Radioactive PCR

[0129] For radioactive PCR, 0.5 μl of non-radioactive and 0.5 μl of radioactive primer (together, 10 picomoles) were used under the PCR protocol as described above.

[0130] Electrophoresis of PCR Products

[0131] To observe small size differences among PCR products, electrophoresis of radioactive products was carried out in a 5% denaturing TBE acrylamide gel. The gels were dried (80° C. for 1.5 hours), exposed to a phosphoimager cassette, and the results were read using a Bas reader 100 (Fuji).

[0132] DNA Sequencing

[0133] The DNA of PCR products was sequenced by the dideoxy chain termination method using an ABI automated sequencing machine (Biological Services, Weizmann Institute, Rehovot, Israel).

Experimental Results

[0134] SSRs in E. coli

[0135] A computerized scan of the genome of E. coli revealed many small arrays of SSRs. Of 199,766 loci with simple sequence repeats (Table 1, top panel), 191,563 exhibited mononucleotide, 6,363 dinucleotide, 2,069 trinucleotide, 48 tetranucleotide, and 2 hexanucleotide core repeat length. These SSRs were distributed rather evenly throughout the genome (FIG. 1a). They are mostly located in non-coding areas, with the exception of those with 3 or 6 bp core sequences, which often were located in coding areas. Since the E. coli genome does not contain long non-coding sequences, most SSR arrays in non-coding areas were immediately upstream or immediately downstream of a gene, often in locations where variability presumably might affect gene expression (Kashi et al. 1997, King et al. 1997, Kashi and Soller 1998).

TABLE 1
Numbers of loci exhibiting given numbers of copies of simple sequence repeats for E. coli and 10 other genomes.

Number of
repeats                 Core repeat length in nucleotides
per locus     Mono-     Di-    Tri-  Tetra-  Penta-  Hexa-

Escherichia coli
        3   163,476   7,099   2,364      51       -      3
        4    42,963     462      64       1       -      -
        5    13,880      28       2       -       -      -
        6     4,119       1       -       -       -      -
        7     1,003       -       -       -       -      -
        8       215       -       -       -       -      -
        9        19       -       -       -       -      -
       10         1       -       -       -       -      -

Aquifex aeolicus
        3    79,286   2,870     697      34       1      6
        4    23,911     181      28       -       -      -
        5     8,224       9       1       -       -      -
        6     2,176       -       1       -       -      -
        7       460       -       -       -       -      -
        8        63       -       -       -       -      -
        9         8       -       -       -       -      -
       10         1       -       -       -       -      -

Mycobacterium tuberculosis
        3   138,899   7,885   4,249      68       9     18
        4    28,602     614     249       1       -      2
        5     5,799      46      32       -       -      -
        6       824       -       1       -       -      -
        7       137       -       1       -       -      -
        8         5       -       -       -       -      -
        9         2       -       -       -       -      -

Treponema pallidum
        3    37,019   4,069     373      20       -      3
        4    12,833     517       4       -       -      2
        5     4,564      49       -       -       -      -
        6     1,444       8       -       -       -      -
        7       436       -       -       -       -      -
        8       155       -       -       -       -      -
        9        46       -       -       -       -      -
       10        16       -       -       -       -      -
       11         6       -       -       -       -      -
       13         2       -       -       -       -      -

Haemophilus influenzae
        3    78,444   1,951     801       6       3      4
        4    27,927      83      12       -       1      1
        5    10,399       5       -       -       -      -
        6     3,892       -       -       1       -      -
        7     1,045       -       -       -       -      -
        8       145       -       -       -       -      -
        9        16       -       1       -       -      -
       10         2       -       -       -       -      -
       20         -       -       -       1       -      -
       21         -       -       -       1       -      -
       22         -       -       -       1       -      -
       23         -       -       -       4       -      -
       37         -       -       -       1       -      -

Mycoplasma pneumoniae
        3    37,389     671     345       9       1      3
        4    12,113      24       8       -       -      -
        5     4,730       2       1       -       -      -
        6     1,455       -       -       -       -      -
        7       360       -       1       -       -      -
        8        30       -       -       -       -      -
        9         6       -       -       -       -      -
       10         -       -       -       -       -      -
       11         -       1       -       -       -      -
       15         1       -       -       -       -      -
       16         2       -       -       -       -      -

Bacillus subtilis
        3    16,336   6,530   1,664      62       4      2
        4    56,141     359      45       -       -      -
        5    21,322      16       -       -       -      -
        6     8,009       -       -       -       -      -
        7     2,838       -       -       -       -      -
        8       405       -       -       -       -      -
        9        28       -       -       -       -      -
       10         2       -       -       -       -      -
       16         1       -       -       -       -      -

Helicobacter pylori
        3    80,371   2,334     645      32       2      2
        4    35,391     119      29       1       -      -
        5    15,316       8       1       -       -      -
        6     6,462       2       -       -       -      1
        7     1,872       -       -       -       -      -
        8       361       1       1       -       -      -
        9        56       3       -       -       -      -
       10         6       -       -       -       -      -
       11         2       3       -       -       -      -
       12         5       -       -       -       -      -
       13         6       -       -       -       -      -
       14        10       -       -       -       -      -
       15         7       -       -       -       -      -
       16         3       -       -       -       -      -

Methanococcus jannaschii
        3    77,903   5,368   1,311      47      12      4
        4    28,414     522      21       1       -      -
        5    11,663      32       1       -       -      -
        6     5,069       -       -       -       -      -
        7     1,469       -       -       -       -      -
        8        96       -       -       -       -      -
        9         8       -       -       -       -      -
       10         1       -       -       -       -      -
       24         1       -       -       -       -      -

Archaeoglobus fulgidus
        3    83,655   4,544   1,107      24       4      2
        4    25,167     392      21       -       -      -
        5     8,444      10       -       -       -      -
        6     2,247       -       -       -       -      -
        7       338       -       -       -       -      -
        8        25       -       -       -       -      -
        9         3       -       -       -       -      -
       10         -       -       -       -       -      -
       15         1       -       -       -       -      -

Synechocystis sp. PCC6803
        3   170,790   2,241   1,516      41       6      4
        4    58,969      88      83       1       -      -
        5    21,668       9      14       -       -      -
        6     6,779       -       -       -       -      -
        7     1,596       -       -       -       -      -
        8       325       -       -       -       -      -
        9        56       -       -       -       -      -
       10        15       -       -       -       -      -
       11         1       -       -       -       -      -

Saccharomyces cerevisiae (chromosome 7)
        3    43,848   2,252     642      43      12     18
        4    14,253     193      44       4       2      3
        5     4,755      32      14       1       -      1
        6     1,494      13      10       -       1      1
        7       642       9       3       -       -      -
        8       236       4       3       -       -      -
        9       119       3       1       -       -      -
       10        69       5       -       -       -      -
       11        49       4       -       -       -      -
       12        24       -       1       -       -      -
       13        26       1       1       -       -      -
       14        16       2       -       -       -      -
       15         8       2       -       -       -      -
       16         5       -       -       -       -      -
       17         3       2       -       -       -      -
       18         3       -       -       -       -      -
       19         2       -       -       -       -      -
       20         3       -       -       -       -      -
       21         1       -       -       -       -      -
       22         1       -       -       -       -      -
       23         3       -       -       -       -      -
       24         2       -       -       -       -      -
       25         1       -       -       -       -      -
       26         1       -       -       -       -      -
       27         1       -       -       -       -      -

[0136] Polymorphism of SSRs in E. coli

[0137] The numbers of core repeats at nine SSR loci among strains or substrains of E. coli were determined using the polymerase chain reaction (PCR, Table 2). Differences in the sizes of PCR products were observed only in long polyacrylamide gels. At two of these loci, PCR products harboring mononucleotide SSR arrays exhibited size differences among strains of E. coli (FIG. 2), with several alleles at each. Some of the pathogenic strains did not exhibit PCR amplification, likely due to point mutations in the DNA sequence to which one or both of the primers anneal. No variation of SSR arrays was observed for core sequence lengths of two or more base pairs, although this finding may have been affected by the SSRs in question having been in coding regions and by the limited number of loci tested.

TABLE 2
Summary of allelism and location for tested SSR loci in E. coli.

Strain and substrain(s)           Number of repeats  Core   Coding or non-coding region  Genomic location, name of ORF or downstream ORF

Mononucleotide SSRs
K12:DH5α, K12:W3110               8                  C      Non-coding                   G1787407, ycgW
B:SR9b                            4
ETEC                              10
EPEC                              8
K12:DH5α, K12:W3110               10                 G      Non-coding                   G1786555, yaiN
B:SR9b                            8
E:1                               10
E:54, E:68                        9
K12, B, EHEC, EPEC, ETEC          9                  T      Non-coding                   G1790782, yjiD

Dinucleotide SSRs
K12, B, EHEC, EPEC, ETEC, E:1-69  4.5                GT*    Non-coding                   G1790630, aidB
K12, B, EHEC, EPEC, ETEC, E:1-69  4.5                TC*    Non-coding                   G1788433, molR_1

Trinucleotide SSRs
K12, B, EHEC, EPEC, ETEC, E:1-69  5                  CGG*   Coding                       G1786284, ftsZ
K12, B, EHEC, EPEC, ETEC, E:1-69  4                  GGT*   Non-coding                   G1787973, ORF unknown

Tetranucleotide SSRs
K12, B, EHEC, EPEC, ETEC, E:1-69  5                  ATTA*  Non-coding                   G1789986, yiaB
K12, B, EHEC, EPEC, ETEC, E:1-69  5                  CTGG*  Coding                       G1788332, hisC

[0138] Variation of mononucleotide SSRs among strains of E. coli was confirmed by sequencing the PCR product for the SSR and flanking domains of all nine SSR loci. Variations of DNA sequence at polymorphic loci are shown in FIG. 3. Results of DNA sequencing confirmed that the SSR arrays were hypervariable, exhibiting several alleles for copy number. Additionally, point mutations in sequences flanking the SSR arrays at both loci were the results of expansion or deletion of tandem mononucleotide repeats. The SSR polymorphisms were located just upstream of open reading frames. In addition, several point mutations involving base pair changes, but not additions or deletions, were observed among the DNA sequences of the respective strains.

[0139] Sizes of SSR Domains in Prokaryotes and Yeast

[0140] The data presented in Table 1 show that the total lengths of particular SSR tracts in E. coli are small, rarely exceeding 12 bp. Multiplying the core repeat length by the number of repeats at a given SSR locus reveals 163,476 loci of 3 bp in length, 42,963 of 4 bp, 13,880 of 5 bp, 11,218 of 6 bp, 1,003 of 7 bp, 677 of 8 bp, 2,383 of 9 bp, 29 of 10 bp, 116 of 12 bp, 2 of 15 bp, 1 of 16 bp, 1 of 18 bp, and none of larger size. A total of 835 kb, or 18.1 percent of the E. coli genome, consists of SSR arrays (Table 3).

TABLE 3
Genomic content of simple sequence DNA repeats for 11 surveyed genomes

Species                                   Genome size (Mb)  SSR content (bp)  SSR content/genome size
Mycoplasma pneumoniae                     0.8               203,536           0.254
Treponema pallidum                        1.1               231,714           0.210
Saccharomyces cerevisiae (chromosome 7)   1.1               247,720           0.225
Aquifex aeolicus                          1.5               418,129           0.278
Methanococcus jannaschii                  1.6               496,853           0.311
Helicobacter pylori 26695                 1.7               536,928           0.316
Haemophilus influenzae                    1.8               472,757           0.263
Archaeoglobus fulgidus                    2.2               451,024           0.205
Synechocystis sp. PCC6803                 3.6               941,457           0.262
Mycobacterium tuberculosis                3.7               661,830           0.178
Bacillus subtilis                         4.2               510,198           0.121
Escherichia coli                          4.6               834,618           0.181
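The multiplication described in paragraph [0140] can be reproduced with a few lines of Python. The sketch below uses the E. coli mono- to tetranucleotide counts of Table 1; the hexanucleotide column is left out, so only tract lengths of 3 to 16 bp are covered. The names `E_COLI_LOCI` and `tract_length_counts` are assumptions introduced for illustration.

```python
from collections import Counter

# Table 1, E. coli: {core length in bp: {repeats per locus: number of loci}}.
# Mono- to tetranucleotide columns only.
E_COLI_LOCI = {
    1: {3: 163476, 4: 42963, 5: 13880, 6: 4119, 7: 1003, 8: 215, 9: 19, 10: 1},
    2: {3: 7099, 4: 462, 5: 28, 6: 1},
    3: {3: 2364, 4: 64, 5: 2},
    4: {3: 51, 4: 1},
}


def tract_length_counts(loci):
    """Number of loci at each total tract length (core length x repeats)."""
    counts = Counter()
    for core_len, by_repeats in loci.items():
        for repeats, n_loci in by_repeats.items():
            counts[core_len * repeats] += n_loci
    return counts


if __name__ == "__main__":
    counts = tract_length_counts(E_COLI_LOCI)
    for length in sorted(counts):
        print(f"{length:>2} bp: {counts[length]:>7,}")
    # Fraction of the genome in SSR arrays, from the Table 3 values for E. coli.
    print(round(834618 / 4.6e6, 3))
```

For example, the 6 bp class combines mononucleotide loci with six repeats (4,119) and dinucleotide loci with three repeats (7,099), giving the 11,218 quoted in the text.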

[0141] The distribution of SSR array lengths (Table 1) and the proportion of the genome comprised of SSR arrays (Table 3) were calculated for ten additional prokaryotes. As in E. coli, mononucleotide SSRs are predominant in all genomes. All prokaryote genomes examined exhibited SSRs with core sequences ranging from one to six bp, the largest core length that was screened for.

[0142] In all genomes, the distribution of core repeat lengths was skewed toward mononucleotide. All genomes exhibited a distribution of total array lengths that also tended toward low numbers, three tandem repeats in most genomes. However, Bacillus subtilis exhibited a relatively small proportion of loci where mononucleotides were tandemly repeated three times.

[0143] The number of genomic sites exhibiting longer array lengths differed among genomes. All genomes showed 99.99% of SSR arrays at 12 or fewer bp in length. Total SSR content of the genomes varied from 12.1 to 31.6 percent, with most genomes clustered toward the middle of the range. SSRs tended to comprise a larger proportion of smaller genomes than of larger ones (Table 3).

[0144] The genomic sequence of chromosome 7 of yeast, Saccharomyces cerevisiae, whose length of DNA approximates that of a prokaryotic genome, was analyzed to compare the SSR content of prokaryotes with that of this simple eukaryote. S. cerevisiae exhibited mononucleotide arrays of greater length than observed in prokaryotes (Table 1), and a larger proportion of arrays with core repeat length greater than one. These findings for number of SSRs of given size agree perfectly with those of Field and Wills (1998). Yeast has several hundred SSR arrays larger than 12 bp. Although 99.7% of SSR arrays were of 12 or fewer bp in length in yeast, this proportion was lower than those observed in prokaryotes.

DNA Chip Design

[0145] FIG. 3b will now be employed to provide a diagrammatic example of how allele specific oligonucleotides for SSR sequences fixed onto a DNA chip, as shown at 10 in FIG. 4, can be used for strain identification in E. coli.

[0146] FIG. 3b shows a comparison of DNA sequences for five E. coli strains as indicated. Allelic variations at two key sites are shown in bold letters.

[0147] Allele-specific oligonucleotides are designed to distinguish among E. coli strains, such that T_(m) values for all oligonucleotides are substantially equal.
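One way to compare T_(m) values across candidate oligonucleotides is the estimate recited in the claims, namely 2×(the number of A plus T residues) + 4×(the number of G plus C residues). The Python sketch below applies that estimate to the bare allele sequences of SEQ ID NOS:37-42. It is an illustration only: the function name `estimate_tm` is an assumption, flanking sequence is omitted, and the T_(m) values used for the PCR primers were calculated with other software.

```python
def estimate_tm(oligo: str) -> int:
    """Tm estimate in degrees C: 2 x (A + T) + 4 x (G + C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Allele cores only (SEQ ID NOS:37-42); real probes would carry flanking sequence.
    alleles = {
        "G x 8": "G" * 8,
        "G x 9": "G" * 9,
        "G x 10": "G" * 10,
        "TAAA": "TAAA",
        "TTAA": "TTAA",
        "TTAAA": "TTAAA",
    }
    for name, seq in alleles.items():
        print(name, estimate_tm(seq))
```

The bare alleles give widely different estimates, which suggests why flanking sequence of suitable length and composition would be included in each probe to bring the values closer together.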

[0148] Table 4 below summarizes the polymorphism evident from FIG. 5a.

TABLE 4

Allele    Strains exhibiting the allele
G × 8*    Bsr9c
G × 9*    wt 54, wt 68
G × 10*   K12:DH5α, wt 1
TAAA*     wt 54, wt 68
TTAA*     K12:DH5α, Bsr9c
TTAAA*    wt 1

[0149] Thus, allele-specific oligonucleotides including the specified and possibly flanking sequences will be arrayed on the surface of a DNA chip as follows: $\begin{matrix}\lbrack G \times 8 \rbrack & \lbrack G \times 9 \rbrack & \lbrack G \times 10 \rbrack & \lbrack\ldots\rbrack \\ \lbrack \mathrm{TAAA} \rbrack & \lbrack \mathrm{TTAA} \rbrack & \lbrack \mathrm{TTAAA} \rbrack & \lbrack\ldots\rbrack\end{matrix}$

[0150] Genomic DNA or amplified DNA from an E. coli sample to be typed will be hybridized onto the chip, and the presence or absence of hybridization to each allele-specific oligonucleotide will be scored. Patterns of presence or absence of hybridization will be strain-specific. For example: $\begin{matrix}\lbrack - \rbrack & \lbrack - \rbrack & \lbrack + \rbrack & \lbrack\ldots\rbrack \\ \lbrack - \rbrack & \lbrack + \rbrack & \lbrack - \rbrack & \lbrack\ldots\rbrack\end{matrix}$

[0151] is diagnostic for E. coli strain DH5α, and $\begin{matrix}\lbrack - \rbrack & \lbrack + \rbrack & \lbrack - \rbrack & \lbrack\ldots\rbrack \\ \lbrack + \rbrack & \lbrack - \rbrack & \lbrack - \rbrack & \lbrack\ldots\rbrack\end{matrix}$

[0152] is diagnostic for E. coli strain wt 54.
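Scoring of such hybridization patterns can be sketched in Python as a lookup against the allele assignments of Table 4. The sketch is illustrative: the names `PROBES`, `ALLELE_STRAINS`, `expected_pattern` and `match_strains` are assumptions, the probe order reads the array row by row with the six alleles of Table 4, and the unspecified further positions of the array are not modeled.

```python
# Probe order: first row of the array, then second row.
PROBES = ("Gx8", "Gx9", "Gx10", "TAAA", "TTAA", "TTAAA")

# Table 4: strains exhibiting each allele.
ALLELE_STRAINS = {
    "Gx8": {"Bsr9c"},
    "Gx9": {"wt 54", "wt 68"},
    "Gx10": {"K12:DH5α", "wt 1"},
    "TAAA": {"wt 54", "wt 68"},
    "TTAA": {"K12:DH5α", "Bsr9c"},
    "TTAAA": {"wt 1"},
}


def expected_pattern(strain):
    """Tuple of booleans: True where the strain should hybridize."""
    return tuple(strain in ALLELE_STRAINS[probe] for probe in PROBES)


def match_strains(pattern):
    """Strains of Table 4 whose expected pattern equals the observed one."""
    strains = set().union(*ALLELE_STRAINS.values())
    return sorted(s for s in strains if expected_pattern(s) == tuple(pattern))


if __name__ == "__main__":
    dh5a_like = (False, False, True, False, True, False)   # - - + / - + -
    wt54_like = (False, True, False, True, False, False)   # - + - / + - -
    print(match_strains(dh5a_like))
    print(match_strains(wt54_like))
```

With only the two loci of Table 4, strains wt 54 and wt 68 produce the same pattern, so separating them would presumably rely on additional probes at the further positions of the array.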

Number of SEQ ID NOs: 42

SEQ ID NO: 1; length 23; DNA; Artificial sequence; Synthetic oligonucleotide
gattttgcat atgagtatat tac 23
SEQ ID NO: 2; length 21; DNA; Artificial sequence; Synthetic oligonucleotide
ttaattacag gatgttcagt c 21
SEQ ID NO: 3; length 21; DNA; Artificial sequence; Synthetic oligonucleotide
aatttatccg gtgaatgtgg t 21
SEQ ID NO: 4; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
caacttaatc tcgggctgac 20
SEQ ID NO: 5; length 19; DNA; Artificial sequence; Synthetic oligonucleotide
tacatggctg attatgcgg 19
SEQ ID NO: 6; length 21; DNA; Artificial sequence; Synthetic oligonucleotide
tcgctatgaa tatctactga c 21
SEQ ID NO: 7; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
gtcagagcag atccagaatg 20
SEQ ID NO: 8; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
tctacagcaa atgaacaatg 20
SEQ ID NO: 9; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
ggtcatcagg tgaaataatc 20
SEQ ID NO: 10; length 21; DNA; Artificial sequence; Synthetic oligonucleotide
cgtcctgata gataaagtgt c 21
SEQ ID NO: 11; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
caatggaact taccaatgac 20
SEQ ID NO: 12; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
taccgcgaag aattcaacac 20
SEQ ID NO: 13; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
agcatcagcg cacaatgcac 20
SEQ ID NO: 14; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
tgtatgcagg ctggcacaac 20
SEQ ID NO: 15; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
ataacgatct ccatatctac 20
SEQ ID NO: 16; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
ctctatcagc aacttctgcc 20
SEQ ID NO: 17; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
atccgcagga ttttcgcacc 20
SEQ ID NO: 18; length 20; DNA; Artificial sequence; Synthetic oligonucleotide
tgccagcgta aatccgcaac 20
SEQ ID NO: 19; length 2; DNA; Escherichia coli
gt 2
SEQ ID NO: 20; length 2; DNA; Escherichia coli
tc 2
SEQ ID NO: 21; length 3; DNA; Escherichia coli
cgg 3
SEQ ID NO: 22; length 3; DNA; Escherichia coli
ggt 3
SEQ ID NO: 23; length 4; DNA; Escherichia coli
atta 4
SEQ ID NO: 24; length 4; DNA; Escherichia coli
ctgg 4
SEQ ID NO: 25; length 259; DNA; Escherichia coli
gttatgtctt atcccacggt atttaatatg gttcattagg atgtttattt cttgattttg 60
catatgagta tattaccccc ccctcaaaaa aataaattaa ttaaaatgat ggcttatata 120
aaataaaatt taaagcaagg aatctcaatg gatgttaaac aaaatgagat tttgtgaaag 180
caataaatta ttgacttcgt tttagatttg tttagctata atgttataca ttcaaatgac 240
tgaacatcct gtaattaaa 259
SEQ ID NO: 26; length 256; DNA; Escherichia coli
ttgttatgtc ttatcccacg gtatttaata tagttcattt ggatgttcat ttctttattt 60
tgcatatgag tatattaccc cttcaaaaaa taaattaatt aaaacgattg cttatataaa 120
acaaaattta aagcaaggaa tctcaatgga tgttaaacaa aatgagattt agtgaaaaca 180
ataaattatt cacttcgttt tagatttgtt tagctataat gttatacatt caaatgactg 240
aacatcctgt atttaa 256
SEQ ID NO: 27; length 257; DNA; Escherichia coli
tttgttatgt cttatcccac ggtatttaat atagttcatt tggatgttca tttctttatt 60
ttgcatatga gtatattacc ccttcaaaaa ataaattaat taaaacgatt gcttatataa 120
aacaaaattt aaagcaagga atctcaatgg atgttaaaca aaatgagatt tagtgaaaac 180
aataaattat tcacttcgtt ttagatttgt ttagctataa tgttatacat tcaaatgact 240
gaacatcctg taattaa 257
SEQ ID NO: 28; length 263; DNA; Escherichia coli; misc_feature (15)..(15), any nucleotide
tctatgttct tatcnccacg gtntttaata tggttcatta ggatgtttat ttcttgattt 60
tgcatatgag tatattaccc ccccccctca aaaaaataaa ttaattaaaa tgatggctta 120
tatnaaatan aatttaaagc aaggantctc aatggatgtt aaacanaatg agattttgtg 180
aangcnatnn attattgnct tcgttgtana tttgctnagc tataatgtta tncattcaaa 240
tgactgaaca tcctgtnntt ana 263
SEQ ID NO: 29; length 264; DNA; Artificial sequence; Consensus sequence for the different E. coli strains
nnnnntntnn ntnnncccac ggtatttaat atngttcatt nggatgttna tttcttnatt 60
ttgcatatga gtatattann nnnncccctn naaaaaataa attaattaaa angatngctt 120
atataaaana aaatttaaag caaggaatct caatggatgt taaacaaaat gagatttngt 180
gaaancaata aattattnac ttcgttntag atttgnttag ctataatgtt atacattcaa 240
atgactgaac atcctgtaat taan 264
SEQ ID NO: 30; length 181; DNA; Escherichia coli; misc_feature (4)..(4), any nucleotide
tttncccgga aaaaaatagg aaaggggggg gggctaatcg gcagggaagg ccgccccgga 60
tagcgggcgg canaaggaat canaatttcc aggtcagacg ggctgcaagt tgcagaccgt 120
taaaatcatc ggnnggggtg tcgtaccaca ctttacctgc cgtcagcccg agattaagtt 180
g 181
SEQ ID NO: 31; length 181; DNA; Escherichia coli; misc_feature (2)..(2), any nucleotide
tnttnnncgg aaaaaaatng aaaggggggg gggctaatcg gcagggaagg ccgccccgga 60
tagcgggcgg cagaaggaat cagaatttcc aggtcagacg ggctgcaagt tgcagaccgt 120
taaaatcatc ggttggggtg tcgtaccaca ctttacctgc cgtcagcccg agattaagtt 180
g 181
SEQ ID NO: 32; length 177; DNA; Escherichia coli; misc_feature (4)..(4), any nucleotide
tttnccggaa aaaaatngaa aggggggggc taatcggcag ggaaggccgc cccggatagc 60
gggcggcaga aggaatcaga atttccaggt cagatgggct gcaagttgca gaccgttata 120
atcatcggtt ggggtgtcgt accacacttt acctgccgtc agcccgagat taagttg 177
SEQ ID NO: 33; length 182; DNA; Escherichia coli; misc_feature (2)..(2), any nucleotide
tntncggaaa aaaanaggaa aggggggggg gctaatcggc agggaaggcc gccccggata 60
gcgggcggca gaaggaatca gaatttccag gtcagacggg ctgcaagttg cagaccgtta 120
aaatcatcgg ttggggtgtc gtaccacact ttacctgccg tcagcccgag attaaagttt 180
gg 182
SEQ ID NO: 34; length 175; DNA; Escherichia coli; misc_feature (1)..(1), any nucleotide
ncggaaaaaa atgaaagggg gggggctaat cggcagggaa ggccgccccg gatagcgggc 60
ggcagaagga atcagaattt ccaggtcaga tgggctgcaa gttgcagacc gttataatca 120
tcggttgggg tgtcgtacca cactttacct gccgtcagcc cgagataaag tttgg 175
SEQ ID NO: 35; length 174; DNA; Escherichia coli
cggaaaaaaa tgaaaggggg ggggctaatc ggcagggaag gccgccccgg atagcgggcg 60
gcagaaggaa tcagaatttc caggtcagat gggctgcaag ttgcagaccg ttataatcat 120
cggttggggt gtcgtaccac actttacctg ccgtcagccc gagataaagt ttgg 174
SEQ ID NO: 36; length 182; DNA; Artificial sequence; Consensus sequence for the different E. coli strains
nnnnnnnnnn naaaaaanng aaannggggg gggctaatcg gcagggaagg ccgccccgga 60
tagcgggcgg cagaaggaat cagaatttcc aggtcagang ggctgcaagt tgcagaccgt 120
tanaatcatc ggttggggtg tcgtaccaca ctttacctgc cgtcagcccg agatnaagtt 180
ng 182
SEQ ID NO: 37; length 8; DNA; Escherichia coli
gggggggg 8
SEQ ID NO: 38; length 9; DNA; Escherichia coli
ggggggggg 9
SEQ ID NO: 39; length 10; DNA; Escherichia coli
gggggggggg 10
SEQ ID NO: 40; length 4; DNA; Escherichia coli
taaa 4
SEQ ID NO: 41; length 4; DNA; Escherichia coli
ttaa 4
SEQ ID NO: 42; length 5; DNA; Escherichia coli
ttaaa 5

What is claimed is:
 1. A method of classifying or typing a prokaryote to a class or a type, the method comprising the step of characterizing at least one polymorphic simple sequence repeat locus in a genome of said prokaryote and, based on a characterization of said polymorphic simple sequence repeat, classifying or typing said prokaryote to a class or a type.
 2. The method of claim 1, wherein said at least one polymorphic simple sequence repeat locus is in a non-coding region of said genome.
 3. The method of claim 1, wherein said prokaryote is of the genus Escherichia.
 4. The method of claim 3, wherein said prokaryote is Escherichia coli.
 5. The method of claim 1, wherein said prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.
 6. The method of claim 5, wherein said prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.
 7. The method of claim 1, wherein said prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.
 8. The method of claim 7, wherein said prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp. PCC6803.
 9. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by an allele specific oligonucleotide hybridization.
 10. The method of claim 9, wherein said allele specific oligonucleotide hybridization is effected over a surface of a DNA chip.
 11. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by a polymerase chain reaction.
 12. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by a sequencing reaction.
 13. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by a heteroduplex hybridization reaction.
 14. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by single strand conformational polymorphism.
 15. The method of claim 1, wherein characterizing said at least one polymorphic simple sequence repeat locus in said genome of said prokaryote is effected by restriction fragment length polymorphism.
 16. A pair of polymerase chain reaction primers having a sequence adapted for exponential amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote.
 17. A polymerase chain reaction product derived by amplifying a portion of said genome using the pair of polymerase chain reaction primers of claim 16.
 18. The pair of polymerase chain reaction primers of claim 16, wherein said polymorphic simple sequence locus is in a non-coding region of said genome.
 19. The pair of polymerase chain reaction primers of claim 16, wherein said prokaryote is of the genus Escherichia.
 20. The pair of polymerase chain reaction primers of claim 19, wherein said prokaryote is Escherichia coli.
 21. The pair of polymerase chain reaction primers of claim 16, wherein said prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.
 22. The pair of polymerase chain reaction primers of claim 21, wherein said prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.
 23. The pair of polymerase chain reaction primers of claim 16, wherein said prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.
 24. The pair of polymerase chain reaction primers of claim 23, wherein said prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp. PCC6803.
 25. An allele specific oligonucleotide comprising a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under stringent allele specific oligonucleotide hybridization conditions of (i) a hybridization solution of 2×standard sodium citrate (SSC) and 0.1% sodium dodecyl sulfate (SDS); (ii) a hybridization temperature of from 42° C. to Tm −5° C. for 30 minutes to overnight, wherein Tm is estimated as 2×(the number of A plus T residues)+4×(the number of G plus C residues); and (iii) post hybridization washes with 0.75×SSC and 0.1% SDS at a temperature from 42° C. to Tm −5° C.
 26. The allele specific oligonucleotide of claim 25, wherein said sequence of nucleotides is perfectly complementary to said specific simple sequence repeat.
 27. A hybrid of the allele specific oligonucleotide of claim 25 and said specific simple sequence repeat.
 28. The allele specific oligonucleotide of claim 25, wherein said polymorphic simple sequence locus is in a non-coding region of said genome.
 29. The allele specific oligonucleotide of claim 25, wherein said prokaryote is of the genus Escherichia.
 30. The allele specific oligonucleotide of claim 29, wherein said prokaryote is Escherichia coli.
 31. The allele specific oligonucleotide of claim 25, wherein said prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.
 32. The allele specific oligonucleotide of claim 31, wherein said prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.
 33. The allele specific oligonucleotide of claim 25, wherein said prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.
 34. The allele specific oligonucleotide of claim 33, wherein said prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp. PCC6803.
 35. A primer having a sequence adapted for amplification of a polymorphic simple sequence repeat locus in a genome of a prokaryote.
 36. The primer of claim 35, wherein said polymorphic simple sequence locus is in a non-coding region of said genome.
 37. The primer of claim 35, wherein said prokaryote is of the genus Escherichia.
 38. The primer of claim 37, wherein said prokaryote is Escherichia coli.
 39. The primer of claim 35, wherein said prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.
 40. The primer of claim 39, wherein said prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.
 41. The primer of claim 35, wherein said prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.
 42. The primer of claim 41, wherein said prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp. PCC6803.
 43. A DNA chip comprising a surface and a plurality of allele specific oligonucleotides attached thereto, each of said plurality of allele specific oligonucleotides including a sequence of nucleotides adapted for effectively hybridizing only with a specific simple sequence repeat of a polymorphic simple sequence repeat locus in a genome of a prokaryote, under stringent allele specific oligonucleotide hybridization conditions of (i) a hybridization solution of 2×standard sodium citrate (SSC) and 0.1% sodium dodecyl sulfate (SDS); (ii) a hybridization temperature of from 42° C. to Tm −5° C. for 30 minutes to overnight, wherein Tm is estimated as 2×(the number of A plus T residues)+4×(the number of G plus C residues); and (iii) post hybridization washes with 0.75×SSC and 0.1% SDS at a temperature from 42° C. to Tm −5° C.
 44. The DNA chip of claim 43, wherein said sequence of nucleotides is perfectly complementary to said specific simple sequence repeat.
 45. The DNA chip of claim 43, wherein said polymorphic simple sequence locus is in a non-coding region of said genome.
 46. The DNA chip of claim 43, wherein said prokaryote is of the genus Escherichia.
 47. The DNA chip of claim 46, wherein said prokaryote is Escherichia coli.
 48. The DNA chip of claim 43, wherein said prokaryote is of a genus selected from the group consisting of Aquifex, Treponema, Bacillus, Listeria and Mycobacterium.
 49. The DNA chip of claim 48, wherein said prokaryote is selected from the group consisting of Aquifex aeolicus, Treponema pallidum, Bacillus subtilis, Listeria monocytogenes and Mycobacterium tuberculosis.
 50. The DNA chip of claim 43, wherein said prokaryote is of a genus selected from the group consisting of Haemophilus, Mycoplasma, Helicobacter, Methanococcus, Archaeoglobus and Synechocystis.
 51. The DNA chip of claim 50, wherein said prokaryote is selected from the group consisting of Haemophilus influenzae, Mycoplasma pneumoniae, Helicobacter pylori, Methanococcus jannaschii, Archaeoglobus fulgidus and Synechocystis sp. PCC6803.